Find the corresponding .Rmd file here.
Find the ggplot2 cheatsheet here.
ggplot2 is a powerful and flexible R package, implemented by Hadley Wickham, for producing elegant graphics.
The concept behind ggplot2 divides a plot into three fundamental parts: Plot = Data + Aesthetics + Geometry.
The principal components of every plot can be defined as follows:
- Data is a data frame.
- Aesthetics indicate the x and y variables. They can also control the color, size, or shape of points, the height of bars, etc.
- Geometry defines the type of graphic (histogram, box plot, line plot, density plot, dot plot, etc.).
There are two major functions in the ggplot2 package, qplot() and ggplot():
- qplot() stands for quick plot and can be used to easily produce simple plots.
- ggplot() is more flexible and robust than qplot() for building a plot piece by piece.
Just as the grammar of language helps us construct meaningful sentences out of words, the Grammar of Graphics helps us construct graphical figures out of different visual elements. This grammar gives us a way to talk about the parts of a plot: all the circles, lines, arrows, and words that are combined into a diagram for visualizing data. Originally developed by Leland Wilkinson, the Grammar of Graphics was adapted by Hadley Wickham to describe the components of a plot.
## Install and Load Library
# install and load the ggplot2 library
# install.packages("ggplot2")
library("ggplot2")
# tidyverse provides dplyr and the %>% pipe used later on
library(tidyverse)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
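In order to create a plot, you:
1. call the ggplot() function, which creates a blank canvas;
2. specify aesthetic mappings with aes(), which map variables to visual properties;
3. add geom layers, which draw the data.
# create canvas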
ggplot(mtcars)
# variables of interest mapped
ggplot(mtcars, aes(x = cyl, y = mpg))
# data plotted
ggplot(mtcars, aes(x = cyl, y = mpg)) +
geom_point()
plot <- ggplot(mtcars, aes(x = cyl, y = mpg))
plot + geom_point()
plot + geom_line()
Note that when you added the geom layer you used the addition (+) operator. As you add new layers you will always use + to add onto your visualization.
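To see what + does, it can help to inspect the plot object itself. The following is a minimal sketch: it assumes that a ggplot object exposes its layers through the `layers` element (an implementation detail rather than something this tutorial relies on), and the variable names are illustrative only.

```r
library(ggplot2)

# Start with a canvas and mappings only: no layers yet
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Each + adds one layer on top of the previous ones
with_points <- base + geom_point()
with_trend  <- with_points + geom_smooth(method = "lm", se = FALSE)

# Count the layers stored in each plot object
n_layers <- c(length(base$layers),
              length(with_points$layers),
              length(with_trend$layers))
n_layers
```

Because every step returns a new plot object, intermediate plots such as `base` can be stored and reused with different geoms.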
The aesthetic mappings take properties of the data and use them to influence visual characteristics, such as position, color, size, shape, or transparency. Each visual characteristic can thus encode an aspect of the data and be used to convey information.
All aesthetics for a plot are specified in the aes() function call (later in this tutorial you will see that each geom layer can have its own aes specification). For example, we can add a mapping from the number of gears of each car to a color characteristic:
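# gear is numeric, so color is drawn as a continuous gradient
ggplot(mtcars, aes(x = cyl, y = mpg, color = gear)) +
  geom_point()
# the columns of mtcars, including cyl and gear, are numeric
class(mtcars$cyl)
## [1] "numeric"
# converting gear to character gives one discrete color per value
color <- as.character(mtcars$gear)
ggplot(mtcars, aes(x = cyl, y = mpg, color = color)) +
  geom_point()
# the same conversion done inline with factor()
ggplot(mtcars, aes(x = cyl, y = mpg, color = factor(gear))) +
  geom_point()
# a constant color is set outside aes()
ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point(color = "blue")
# several aesthetics can be mapped at once
ggplot(mtcars, aes(cyl, mpg, color = factor(gear), size = hp)) + # hp = horsepower
  geom_point()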
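A common source of confusion is the difference between mapping a color inside aes() and setting a constant color outside of it. The sketch below illustrates this using layer_data(), which returns the values ggplot2 computed for a layer. The variable names are illustrative, and the wt column is used only to spread the points out.

```r
library(ggplot2)

# Mapped: "blue" is treated as a data value and passed through the color scale
mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = "blue"))

# Set: "blue" is used directly as the point color
fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue")

# Inspect the colors that each plot actually draws
mapped_colour <- unique(layer_data(mapped)$colour)
fixed_colour  <- unique(layer_data(fixed)$colour)
mapped_colour
fixed_colour
```

In the mapped version the points take the first color of the default discrete palette and a legend appears, which is usually not what was intended.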
Building on these basics, ggplot2 can be used to build almost any kind of plot you may want. These plots are declared using functions that follow from the Grammar of Graphics.
The most obvious distinction between plots is what geometric objects (geoms) they include. ggplot2 supports a number of different types of geoms, including:
- geom_point for drawing individual points (e.g., a scatter plot)
- geom_line for drawing lines (e.g., for a line chart)
- geom_smooth for drawing smoothed lines (e.g., for simple trends or approximations)
- geom_bar for drawing bars (e.g., for bar charts)
- geom_histogram for drawing binned values (e.g., a histogram)
- geom_polygon for drawing arbitrary shapes
- geom_map for drawing polygons in the shape of a map (you can access the data to use for these maps with the map_data() function)
# Left column: x and y mapping needed!
ggplot(mtcars, aes(x = cyl, y = mpg)) +
geom_point()
ggplot(mtcars, aes(x = cyl, y = mpg)) +
geom_line()
# plot with both points and smoothed line
ggplot(mtcars, aes(x = cyl, y = mpg)) +
geom_point() +
geom_smooth(method = "lm")
ggplot(mtcars, aes(x = cyl, y = mpg, color=factor(gear))) +
geom_point() +
geom_smooth(method = "lm", color = "red")
# color aesthetic passed to each geom layer
ggplot(mtcars, aes(x = cyl, y = mpg, color = cyl)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
# color aesthetic specified for only the geom_point layer
ggplot(mtcars, aes(x = cyl, y = mpg)) +
geom_point(aes(color = factor(cyl))) +
geom_smooth(method = "lm", se = FALSE)
# Right column: no y mapping needed!
ggplot(data = mtcars, aes(x = gear)) +
geom_bar()
ggplot(data = mtcars, aes(x = gear)) +
geom_histogram()
ggplot(data = iris, aes(x = Sepal.Length)) +
geom_histogram()
ggplot(mtcars, aes(factor(gear), mpg)) +
geom_violin()
ggplot(mtcars, aes(factor(gear), mpg)) +
geom_boxplot() +
geom_point()
ggplot(mtcars, aes(factor(gear), mpg)) +
geom_violin() +
geom_point(shape = 1, position = "jitter")
If you look at the bar chart below, you’ll notice that the y axis was defined for us as the count of elements that have each particular value. This count isn’t part of the data set (it’s not a column in mtcars), but is instead a statistical transformation that geom_bar automatically applies to the data. In particular, it applies the stat_count transformation.
ggplot(mtcars, aes(x = gear)) +
geom_bar()
class_count <- dplyr::count(mpg, class)
class_count
## # A tibble: 7 x 2
##   class          n
##   <chr>      <int>
## 1 2seater        5
## 2 compact       47
## 3 midsize       41
## 4 minivan       11
## 5 pickup        33
## 6 subcompact    35
## 7 suv           62
ggplot(mpg, aes(x = hwy)) +
geom_bar()
ggplot(class_count, aes(x = class, y = n)) +
geom_bar(stat = "identity")
ggplot(class_count, aes(x = class, y = n, fill=class)) +
geom_bar(stat = "identity")
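To check that geom_bar really is counting rows, the computed values can be compared with a manual count. This is a small sketch using layer_data() on the mtcars bar chart from above; the variable names are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = gear)) +
  geom_bar()

# Counts computed by stat_count for the bar layer, ordered by gear
bar_counts <- layer_data(p)$count

# The same counts computed by hand
manual_counts <- as.vector(table(mtcars$gear))

bar_counts
manual_counts
```

The two vectors agree, which is why pre-computed counts need stat = "identity": otherwise geom_bar would count the rows of the summary table instead of using its n column.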
We can also call stat_ functions directly to add additional layers. For example, here we create a scatter plot of highway miles for each displacement value and then use stat_summary to plot the mean highway miles at each displacement value.
ggplot(mpg, aes(displ, hwy)) +
geom_point(color = "grey") +
stat_summary(fun = "mean", geom = "line", size = 1, linetype = "dashed")
In addition to a default statistical transformation, each geom also has a default position adjustment which specifies a set of “rules” as to how different components should be positioned relative to each other. This position is noticeable in a geom_bar if you map a different variable to the color visual characteristic:
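# bar chart of tooth length by supplement type, filled by dose
# dose is numeric here, so the fill is a continuous gradient
ggplot(ToothGrowth, aes(x = supp, y = len, fill = dose)) +
  geom_bar(stat = "identity")
# converting dose to a factor gives one discrete fill color per dose
ggplot(ToothGrowth, aes(x = supp, y = len, fill = factor(dose))) +
  geom_bar(stat = "identity")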
By default, geom_bar uses a position adjustment of “stack”, which makes each rectangle’s height proportional to its value and stacks the rectangles on top of each other. We can use the position argument to specify which position adjustment rules to follow:
# position = "dodge": values next to each other
ggplot(mpg, aes(x = class, fill = drv)) +
geom_bar(position = "dodge")
# position = "fill": percentage chart
ggplot(mpg, aes(x = class, fill = drv)) +
geom_bar(position = "fill")
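One way to see what position = "fill" does is to look at the computed bar coordinates. The sketch below assumes the ymax column that layer_data() reports for bar layers; the variable names are illustrative.

```r
library(ggplot2)

p_fill <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill")

# Computed rectangle coordinates for every class/drv combination
built <- layer_data(p_fill)

# Highest point of the stack within each class
top_of_stack <- tapply(built$ymax, as.integer(built$x), max)
top_of_stack
```

Every stack tops out at 1, so the chart shows proportions within each class rather than raw counts.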
Whenever you specify an aesthetic mapping, ggplot uses a particular scale to determine the range of values that the data should map to. Thus when you specify
# color the data by car class
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
geom_point()
ggplot automatically adds a scale for each mapping to the plot:
# explicit scales on the economics data; scale_colour_discrete() has no visible effect here because no color is mapped
ggplot(economics, aes(date, unemploy)) +
geom_point() +
scale_y_continuous() +
scale_colour_discrete()
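For the mpg scatter plot colored by class, spelling out the default scales could look like the sketch below. It assumes the default ggplot2 options, under which a discrete color mapping uses scale_colour_discrete(); the variable names are illustrative.

```r
library(ggplot2)

# Scales added implicitly by ggplot2
implicit <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# The same plot with the default scales written out
explicit <- implicit +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_colour_discrete()

# Both versions assign the same color to every point
same_colours <- identical(layer_data(implicit)$colour,
                          layer_data(explicit)$colour)
same_colours
```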
ggplot(economics, aes(date, unemploy)) +
geom_line() +
scale_y_continuous(limits = c(5000, max(economics$unemploy)))
## Warning: Removed 41 row(s) containing missing values (geom_path).
ggplot(economics, aes(date, unemploy)) +
geom_line() +
scale_y_continuous(limits = c(0, max(economics$unemploy))) +
scale_x_date(limits = c(as.Date("2000-01-01"), as.Date(Sys.time())))
## Warning: Removed 390 row(s) containing missing values (geom_path).
Each scale can be represented by a function with the following name: scale_, followed by the name of the aesthetic property, followed by an _ and the name of the scale. A continuous scale will handle things like numeric data (where there is a continuous set of numbers), whereas a discrete scale will handle things like colors (since there is a small list of distinct colors).
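As a rough illustration of this naming pattern, the sketch below creates three scales and checks which aesthetic each one controls. Reading the `aesthetics` field of a scale object is an assumption about ggplot2 internals made for demonstration purposes, and the palette and colors are arbitrary choices.

```r
library(ggplot2)

# scale_<aesthetic>_<type>()
x_scale      <- scale_x_continuous()                # x position, continuous
colour_scale <- scale_colour_brewer(palette = "Set1") # color, ColorBrewer palette
fill_scale   <- scale_fill_gradient(low = "white", high = "steelblue") # fill, gradient

# Each scale object records the aesthetics it applies to
"x" %in% x_scale$aesthetics
"colour" %in% colour_scale$aesthetics
"fill" %in% fill_scale$aesthetics
```

Note that ggplot2 accepts both spellings, so scale_color_brewer() and scale_colour_brewer() refer to the same scale.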
While the default scales will work fine, it is possible to explicitly add different scales to replace the defaults. For example, you can use a scale to change the direction of an axis:
# mileage relationship, ordered in reverse
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
geom_point() +
scale_x_reverse() +
scale_y_reverse()
A common parameter to change is which set of colors to use in a plot. While you can use the default coloring, a more common option is to use the predefined palettes from colorbrewer.org. These color sets have been carefully designed to look good and to be viewable by people with certain forms of color blindness. We can use ColorBrewer palettes by adding the scale_color_brewer() function and passing the palette name as an argument.
# default color brewer
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
geom_point() +
scale_color_brewer()
# specifying color palette
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
geom_point() +
scale_color_brewer(palette = "Set3")
ggplot(mpg, aes(displ, hwy, color = class)) +
geom_point() +
scale_color_hue(h = c(270, 360)) # blue to red
[colorbrewer.org](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3)
ggplot(midwest, aes(area, poptotal)) +
geom_point() +
scale_y_log10()
mpg %>%
  group_by(class) %>%
  summarize(maxhwy = max(hwy)) %>%
  ggplot(aes(class, maxhwy)) +
  geom_col() +
  # label each bar with its class name in upper case
  scale_x_discrete(labels = toupper(sort(unique(mpg$class))))
ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot()
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
ggplot(ToothGrowth, aes(x=factor(dose), y=len)) +
geom_boxplot()
ggplot(ToothGrowth, aes(x=factor(dose), y=len)) +
geom_boxplot() +
scale_x_discrete(name ="Dose (mg)", limits=c("1","2","0.5"))
Facets are ways of grouping a data plot into multiple different pieces (subplots). This allows you to view a separate plot for each value of a categorical variable. You can construct a plot with multiple facets by using the facet_wrap() function. This will produce a “row” of subplots, one for each value of the categorical variable (the number of rows can be specified with an additional argument):
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_wrap(~ year)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_wrap(~ manufacturer)
You can also use facet_grid() to facet your data by more than one categorical variable. Note that we use a tilde (~) in our facet functions. With facet_grid() the variable to the left of the tilde will be represented in the rows and the variable to the right will be represented across the columns.
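A minimal facet_grid() sketch on the mpg data might look as follows. The choice of drv for the rows and cyl for the columns is only an example, not a choice made elsewhere in this tutorial.

```r
library(ggplot2)

# Rows: drive train (drv); columns: number of cylinders (cyl)
p_grid <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
p_grid

# One panel per row/column combination, including empty ones
n_panels <- nrow(ggplot_build(p_grid)$layout$layout)
n_panels
```

Unlike facet_wrap(), the grid keeps a panel for every combination of the two variables, even when a combination has no observations.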
# theme() can also hide plot elements: here the y-axis title, text, and ticks
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
Textual labels and annotations (on the plot, axes, geometry, and legend) are an important part of making a plot understandable and communicating information. Although not an explicit part of the Grammar of Graphics (they would be considered a form of geometry), ggplot makes it easy to add such annotations.
You can add titles and axis labels to a chart using the labs() function (not labels, which is a different R function!):
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
geom_point() +
labs(title = "Fuel Efficiency by Engine Power",
subtitle = "Fuel economy data from 1999 and 2008 for 38 popular models of cars",
x = "Engine power (litres displacement)",
y = "Fuel Efficiency (miles per gallon)",
color = "Car Type")
ggplot2 offers a very high level of customization through the theme() function and its preset themes.
ggplot(mpg, aes(displ, hwy, color = class)) +
geom_point() +
theme_classic()
ggplot(mpg, aes(displ, hwy, color = class)) +
geom_point() +
theme_classic() +
theme(legend.position = "bottom",
legend.background = element_rect(fill = "#EEEEEE", color = "black"),
legend.title = element_blank(),
axis.title = element_text(size = 16))
ggplot(mpg, aes(displ, hwy, color = class)) +
geom_point() +
theme_classic() +
theme(legend.position = c(1, 1),
legend.justification = c(1,1),
legend.direction = "horizontal",
legend.title = element_blank()) +
xlab("Engine Displacement") +
ylab("Highway Fuel Economy (miles / gallon)") +
ggtitle("Highway fuel economy versus engine displacement",
"or why do you need that big truck again? ")
It is also possible to add labels into the plot itself (e.g., to label each point or line) by adding a new geom_text or geom_label to the plot; effectively, you’re plotting an extra set of data which happen to be the variable names:
# a data table of each car that has best efficiency of its type
best_in_class <- mpg %>%
group_by(class) %>%
filter(row_number(desc(hwy)) == 1)
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_label(data = best_in_class, aes(label = model), alpha = 0.5)
However, note that two labels overlap one another in the top left part of the plot. We can use the geom_text_repel() function from the ggrepel package to help position the labels.
library(ggrepel)
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_text_repel(data = best_in_class, aes(label = model))
Let’s save that great plot we just made. Saving plots in ggplot is done with the ggsave() function:
ggsave("hwy_vs_displ.png")
ggsave("hwy_vs_displ.png", width = 6, height = 6)
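By default ggsave() writes the last plot that was displayed. When a script builds several plots, passing the plot explicitly is less error prone. The sketch below assumes a temporary output directory and arbitrary dimensions, and the object names are illustrative.

```r
library(ggplot2)

# Keep the plot in a variable so it can be saved explicitly
p_save <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Write to a temporary file; width and height are in inches by default
out_file <- file.path(tempdir(), "hwy_vs_displ.png")
ggsave(out_file, plot = p_save, width = 6, height = 4, dpi = 150)

file.exists(out_file)
```

The file type is inferred from the extension, so changing the file name to end in .pdf would produce a PDF instead.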
# Get data
library(gapminder)
# Load libraries
library(ggplot2)
library(gganimate)
# Make a ggplot, then animate it over the years with transition_time()
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
geom_point() +
scale_x_log10() +
theme_bw() +
# gganimate specific bits
labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
transition_time(year) +
ease_aes('linear')
# Save as a GIF
anim_save("271-ggplot2-animated-gif-chart-with-gganimate1.gif")
# Load the library
library(leaflet)
# Note: if you have not already installed it, install it with:
# install.packages("leaflet")
# Background 1: NASA
m <- leaflet() %>%
addTiles() %>%
setView( lng = 2.34, lat = 48.85, zoom = 5 ) %>%
addProviderTiles("NASAGIBS.ViirsEarthAtNight2012")
m
# Background 2: World Imagery
m <- leaflet() %>%
addTiles() %>%
setView( lng = 2.34, lat = 48.85, zoom = 3 ) %>%
addProviderTiles("Esri.WorldImagery")
m
# Save the widget to an HTML file if needed
library(htmlwidgets)
saveWidget(m, file="backgroundMapTile.html")